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ABSTRACT 

Summary: Genome-wide association studies (GWAS) have identi- 
fied thousands of human trait-associated single nucleotide polymor- 
phisms. Here, I describe a freely available R package for visualizing 
GWAS results using Q-Q and manhattan plots. The qqman package 
enables the flexible creation of manhattan plots, both genome-wide 
and for single chromosomes, with optional highlighting of SNPs of 
interest. 

Availability: qqman is released under the GNU General Public Li- 
cense, and is freely available on the Comprehensive R Archive Net- 
work ( http://cran.r-project.orq/packaqe=qqman ). The source code is 
available on GitHub ( https://qithub.com/stephenturner/qqman ). 
Contact: turner@virginia.edu 



1 INTRODUCTION 

Genome-wide association studies (GWAS) have been successful in 
identifying thousands of trait and disease-associated single nucleo- 
tide polymorphisms (SNPs) (Welter et al., 2014). A typical GWAS 
will examine hundreds of thousands to millions of polymorphic 
variants in many samples, testing their statistical association with 
discrete outcomes (a "case-control study," such as diabetic vs. non- 
diabetic) and/or continuous outcomes (quantitative traits such as 
height, lipid levels, or expression level of a gene). The primary 
result of this statistical analysis is a list of SNPs, their associated 
chromosomal position, and a P-value representing the statistical 
significance of the association. 

A commonly used method used to visualize GWAS results is the 
"manhattan plot" - a plot of the -log l0 (P- value) of the association 
statistic on the y-axis versus the chromosomal position of the SNP 
on the x-axis. Regions with many highly associated SNPs in link- 
age disequilibrium appear as "skyscrapers" along the plot. Another 
commonly used results diagnostic plot is the quantile-quantile ("Q- 
Q") plot. Q-Q plots display the observed association P-value for all 
SNPs on the ;y-axis versus the expected uniform distribution of P- 
values under the null hypothesis of no association on the x-axis. 
Strongly associated SNPs will deviate from the diagonal at the 
upper-right end of the plot, while systematic deviation from the 
diagonal may indicate problems with the data, such as population 
stratification or cryptic relatedness. 
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One of the most commonly used software packages for manipu- 
lating and analyzing GWAS data is PLINK (Purcell et al., 2007). 
Here, I describe an R package that allows for quick and flexible 
generation of publication-ready Q-Q and manhattan plots directly 
from PLINK results files. 

2 IMPLEMENTATION & EXAMPLE USAGE 

The qqman software is developed as a package for the R statistical compu- 
ting environment, and is released under the GNU General Public License 
on the Comprehensive R Archive Network (CRAN). The qqman package 
ships with an example set of GWAS results called gwasRe suits in a 
format similar to the output from plink --assoc that will be used in 
the examples below. For more examples, please view the package vignette. 

The manhattan ( ) function in the qqman package takes a data frame 
with columns containing the chromosome number, chromosomal position, 
P-value, and optionally the SNP name. By default, manhattan ( ) looks 
for column names corresponding to those output by the plink --assoc 
command, namely, "CHR," "BP," "P," and "SNP," although different 
column names can be specified by the user. Calling the manhattan () 
function with a data frame of GWAS results as the single argument draws a 
basic manhattan plot, defaulting to a black-and-white color scheme: 

# Load pkg, view data, draw basic manhattan plot 
library (qqman) 

head (gwasResults ) 
manhattan (gwasResults) 

Further graphical parameters can be passed to the manhattan ( ) func- 
tion to control things like point character, size, colors, etc. When run with 
defaults, the manhattan ( ) function draws horizontal lines drawn at 
—log\o(\ x 10~ 5 ) for "suggestive" associations and -/ogio(5x 10 8 ) for the 
"genome-wide significant" threshold. These can be moved to different 
locations or turned off completely with the suggestiveline and 
genomewideline arguments, respectively. 

# Reduce point size to 80% 
manhattan (gwasResults , cex=0.8) 

# Change colors to alternating blue and orange 
manhattan (gwasResults , col=c ( "blue4 " , "orange3")) 

# Turn off the suggestive line at le-5: 
manhattan (gwasResults, suggestiveline=FALSE ) 

The behavior of the manhattan ( ) function changes slightly when re- 
sults from only a single chromosome are used. Here, instead of plotting 
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alternating colors and chromosome number on the x-axis, the SNP's posi- 
tion on that chromosome is plotted on the x-axis: 

# Plot only results on chromosome 3 
chr3 <- subset (gwasResults , chr==3) 
manhattan ( chr 3 ) 

The manhattan ( ) function has the ability to highlight SNPs of inter- 
est. Here, a character vector containing SNP names to highlight. The 
qqman package ships with an example set of "interesting" SNPs called 
snpsOf Interest, used in the example below. 

# View and highlight SNPs of interest 
head (snpsOf Interest) 

manhattan (gwasResults, highlight=snpsOf Interest) 

Additionally, highlighting SNPs of interest can be combined with limit- 
ing to a single chromosome to enable "zooming" in to a particular region 
containing SNPs of interest: 

# View and highlight interesting SNPs on chr3 : 
chr3 <- subset (gwasResults , chr==3) 
manhattan (chr3, highlight=snpsOf Interest) 

Finally, the qq ( ) function can be used to generate a Q-Q plot to visual- 
ize the distribution of association P-values. The qq ( ) function takes a 
vector of P- values as its only required argument. Further graphical parame- 
ters can be passed to qq ( ) to control the plot title, axis labels, point char- 
acters, colors, point/text sizes, etc. 

# Draw a Q-Q plot with a title 

qq (gwasResults$P, main="Q-Q plot of P-values") 



into existing R-based scripted workflows to further enable auto- 
mated reproducible research. Furthermore, users can take ad- 
vantage of R's very granular control of graphical output, enabling 
a high degree of customizability in creating high-resolution, publi- 
cation-ready figures. 
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Fig. 1. Manhattan plot highlighting SNPs of interest on chromosome 3, 
with Q-Q plot showing substantial deviation from the diagonal (inset). 



3 CONCLUSIONS 

The qqman package is a user-friendly tool to visualize results 
from GWAS experiments using Q-Q and manhattan plots. These 
graphics can be created in other software, such as the standalone 
desktop software Haploview (Barrett, Fry, Mailer, & Daly, 2005), 
or for focused regions using the web-based application LocusZoom 
(Pruim et al., 2010). Conversely, qqman is distributed as an R 
package with no other dependencies that can be easily integrated 
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